Text File | 1993-05-13 | 12KB | 276 lines
DSP3210 Overview
----------------
The DSP3210 is a full 32-bit floating point DSP implemented in 0.9 micron CMOS.
It provides many advantages over fixed point DSPs such as the Motorola 56000.
Some of the main features of the DSP3210 include:
* 32-bit floating point arithmetic.
* 32-bit addressing.
* Large (8k) on-chip, zero wait-state memory.
* Single cycle instructions (for up to 33 Mflops).
* Shares a bus with a Motorola or Intel style CPU.
* Serial I/O with DMA transfer counters for up to 25 Mbits/second transfer:
    Serial data transfers occur without processor intervention.
    Cycles are stolen when necessary.
    DMA control for serial in and serial out.
* Barrel shifter for bit manipulation in graphics or data encryption.
* Both mu-law and A-law encoding.
* Bit I/O: general purpose 8-bit I/O port for control of external hardware.
* Programmable 32-bit timer for interval timing, rate generation, event
  counting or waveform generation.
* Fully vectored interrupt structure with hardware context save:
    Allows very fast interrupt processing, up to 2 million/second.
* Low power CMOS design.
No special programming is required on the DSP3210 to implement floating point
algorithms, or to process signals with a much larger dynamic range (in excess
of 1500 dB as opposed to < 300 dB for fixed point). The DSP3210 is also
designed to share a host memory bus with either a Motorola or Intel style CPU.
This greatly reduces system cost by removing the requirement for expensive
fast local memory for the DSP. This also removes any practical restrictions on
program or data size. A large on-chip cache (8k) combined with software that
intelligently utilizes the cache allows the DSP3210 to execute complex signal
processing algorithms without expensive local memory. All instructions execute
in a single cycle (four clock periods: 80 ns for a 50 MHz part or 60 ns for a
66 MHz part), including all floating point normalization (which is performed
automatically). A single instruction may perform two floating point operations:
a floating point multiplication and a floating point addition. The DSP3210
also supports up to four memory accesses in a single instruction cycle
(quad-word transfer). The DSP3210 architecture features seven functional units:
* Control Arithmetic Unit (CAU)
* Data Arithmetic Unit (DAU)
* On-chip memory (RAM0, RAM1, Boot ROM)
* Bus Interface
* Serial I/O (SIO)
* DMA Controller (DMAC)
* Timer/Status Control (TSC)
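The cycle time, throughput and dynamic range figures quoted in this overview
can be cross-checked with a little arithmetic. The following is an illustrative
Python sketch, not DSP3210 code. The function names are hypothetical, and two
values are assumptions that the text does not state: an 8-bit exponent for the
32-bit floating point format, and a nominal 66.67 MHz clock for the "66 MHz"
part.

```python
import math

CLOCKS_PER_CYCLE = 4   # from the text: one instruction cycle is four clock periods
FLOPS_PER_CYCLE = 2    # from the text: one multiply and one add per instruction
EXPONENT_BITS = 8      # ASSUMPTION: exponent width of the 32-bit float format
FIXED_POINT_BITS = 32  # a 32-bit fixed point word, used for comparison only


def cycle_time_ns(clock_hz):
    """Instruction cycle time in nanoseconds."""
    return CLOCKS_PER_CYCLE / clock_hz * 1e9


def peak_mips(clock_hz):
    """Peak instruction rate in millions of instructions per second."""
    return clock_hz / CLOCKS_PER_CYCLE / 1e6


def peak_mflops(clock_hz):
    """Peak floating point rate in Mflops."""
    return FLOPS_PER_CYCLE * peak_mips(clock_hz)


def range_db(powers_of_two):
    """Dynamic range in dB covered by the given number of powers of two."""
    return 20 * math.log10(2) * powers_of_two


if __name__ == "__main__":
    # ASSUMPTION: the "66 MHz" part is modelled as 200/3 MHz (about 66.67 MHz).
    for clock_hz in (50e6, 200e6 / 3):
        print(f"{clock_hz / 1e6:6.2f} MHz: "
              f"{cycle_time_ns(clock_hz):5.1f} ns/cycle, "
              f"{peak_mips(clock_hz):5.2f} MIPS, "
              f"{peak_mflops(clock_hz):5.2f} Mflops")
    print(f"float range: {range_db(2 ** EXPONENT_BITS - 1):.0f} dB")
    print(f"fixed range: {range_db(FIXED_POINT_BITS):.0f} dB")
```

Under these assumptions a 50 MHz part gives an 80 ns cycle, 12.5 million
instructions per second and 25 Mflops, and the estimated floating point range
is roughly 1535 dB against roughly 193 dB for a 32-bit fixed point word, which
is consistent with the "in excess of 1500 dB" and "< 300 dB" figures above.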
The Control Arithmetic Unit
---------------------------
The CAU is responsible for address calculation, branching control and all 16
and 32-bit integer logic and arithmetic operations. It is a RISC core
consisting of a 32-bit Arithmetic Logic Unit (ALU), a 32-bit Program Counter
(PC), 22 32-bit general purpose registers (r1-r22) alongside r0, which always
reads as 0, and a 32-bit barrel shifter. This core executes instructions at up
to 16.7 million instructions per second. There are special register
considerations in the CAU:
    r0        hardwired to 0 (always)
    r1-r14    DA instruction memory reference (X,Y,Z) pointer registers
    r15-r19   DA instruction memory reference (X,Y,Z) increment registers
    r20       used by error exception facility to store old pc
    r21       stack pointer (sp)
    r22       pointer to the exception vector table (evtp)
The CAU provides the following branching and control instructions:
    if (COND) goto {N,rB,rB+N}       Conditional branch based on flags
    if (rM-- >= 0) goto {N,rB,rB+N}  Conditional branch using loop counter
    goto {N,rB,rB+N,M,rB+M}          Unconditional branch
    nop                              No operation
    call {N,rB,rB+N,M}(rM)           Call subroutine
    return {rM}                      Return from subroutine
    do K,{L,rM}                      Do next K+1 instruction(s) L+1 or (rM+1)
                                     times. K=0,1,2...127; L=rM=0,1,2...2047
    dolock K,{L,rM}                  Signals interlocked bus transfer
    doblock {L,rM}                   Signals quad-word transfers
    ireturn                          Return from interrupt
    sftrst                           Soft-reset; changes error level to base
                                     level; encoded as spc=(byte)r0
    waiti                            Wait for interrupt; encoded as spc=(long)r0
    where: rB   = pc, r0-r22
           rM   = r1-r22
           N    = 16-bit unsigned integer
           M    = 24-bit unsigned integer
           COND = one of the DSP3210 condition codes (refer to DSP3210 manual)
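The "plus one" counts in the do instruction are easy to misread, so the
following Python sketch spells them out. It is only an arithmetic illustration
of the description in the table above; the function name is hypothetical and
nothing here models the hardware itself.

```python
def do_loop_executions(k, count):
    """Total instructions executed by `do K,L` (or `do K,rM` with rM == count)."""
    if not 0 <= k <= 127:
        raise ValueError("K must be in 0..127")
    if not 0 <= count <= 2047:
        raise ValueError("L (or rM) must be in 0..2047")
    body_length = k + 1      # the next K+1 instructions form the loop body
    iterations = count + 1   # the body is executed L+1 (or rM+1) times
    return body_length * iterations


if __name__ == "__main__":
    # `do 2,9` repeats the next 3 instructions 10 times.
    print(do_loop_executions(2, 9))
```

So `do 0,0` executes a single instruction once, and the largest encodable loop
(K=127, L=2047) executes 128 instructions 2048 times.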
The Data Arithmetic Unit
------------------------
The DAU consists of a 32-bit floating point multiplier, a 40-bit floating
point adder, four 40-bit floating point accumulators (a0-a3), a clip test
register (ctr), and a control register (dauc). The multiplier and adder
operate in parallel to perform up to 16.7 million computations per second
(12.5 million for a 50 MHz part) of the form (a=b+c*d), also known as a
multiply-accumulate. The DAU contains a four stage pipeline which is visible
to the application programmer. The DAU supports the following floating point
formats:
    Single precision (32-bit) in both DSP32 and IEEE format
    Extended single precision (40-bit) (uses 8 mantissa guard bits)
Single instruction data type conversions are done in the DAU hardware:
    DSP32 and IEEE 32-bit floating point
    16/32-bit integer
    8-bit unsigned
    mu-law and A-law
The DAU has a number of special instructions to greatly simplify data type
conversions and other common operations:
    [Z=] aN = ic(Y)       Input conversion: mu-law, A-law, 8-bit linear to float.
    [Z=] aN = oc(Y)       Output conversion: float to mu-law, A-law, 8-bit linear.
    [Z=] aN = float16(Y)  16-bit integer to float.
    [Z=] aN = float32(Y)  32-bit integer to float.
    [Z=] aN = int16(Y)    Float to 16-bit integer (round or truncate, dauc[4]).
    [Z=] aN = int32(Y)    Float to 32-bit integer (round or truncate, dauc[4]).
    [Z=] aN = round(Y)    Round to nearest, float(40) to float(32).
    [Z=] aN = ifalt(Y)    Conditional assignment/memory write.
    [Z=] aN = ifaeq(Y)    Conditional assignment/memory write.
    [Z=] aN = ifagt(Y)    Conditional assignment/memory write.
    [Z=] aN = dsp(Y)      IEEE to DSP format conversion.
    [Z=] aN = ieee(Y)     DSP to IEEE format conversion.
    [Z=] aN = seed(Y)     32-bit to 32-bit reciprocal seed.
Where [Z=] denotes an optional write of the result to the memory location Z.
Note that Y may not be a0-a3 for the dsp() special function.
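For readers unfamiliar with the companding handled by ic() and oc(), the
following Python sketch shows the textbook continuous mu-law curve with
mu = 255. It is an assumption-laden illustration only: it is not the bit-exact
8-bit telephony encoding, it does not cover A-law, and it makes no claim about
how the DSP3210 hardware performs the conversion. The function names are
hypothetical.

```python
import math

MU = 255.0  # standard mu-law parameter


def mu_law_compress(x):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("companded value must lie in [-1, 1]")
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


if __name__ == "__main__":
    for sample in (-0.5, 0.0, 0.01, 1.0):
        companded = mu_law_compress(sample)
        print(sample, companded, mu_law_expand(companded))
```

The curve boosts small amplitudes (a sample of 0.01 maps to more than 0.2),
which is why a companded 8-bit code can carry a wider range than 8-bit linear
data.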
Addressing Modes
----------------
DSP3210 assembler language exhibits a syntax very similar to 'C'. The notation
conventions are as follows: a0-a3 are the accumulators (DAU), and r0-r22 are
the CAU registers. Instructions take the following appearance:
    r2 = (long)r1       ; CAU register direct: copy the contents of r1 into r2
    r2 = (long)*r1      ; register indirect: load the value pointed to by r1 into r2
    r1 = (long)r1 + 1   ; increment r1 by 1
    *r2++ = (long)r1    ; store r1 at *r2, then postincrement r2
    r3 = (long)r1 + r2  ; add r1 and r2; store the result in r3
    r3 = (long)*r1++r2  ; load *r1 into r3, then postincrement r1 by r2
    a2 = a2 + *r2 * a3  ; DAU multiply-accumulate: use that pipeline!
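The postmodified forms above read or write memory first and update the pointer
register afterwards. The following Python sketch models just that ordering.
It is a simplification under stated assumptions: memory is a list indexed by
word, so the byte addressing of the real part is ignored, and the helper names
are hypothetical.

```python
def load_postmodify(regs, mem, dest, ptr, step):
    """Model of `dest = *ptr++step`: read at ptr, then advance ptr by step."""
    regs[dest] = mem[regs[ptr]]
    regs[ptr] += step


def store_postincrement(regs, mem, ptr, src):
    """Model of `*ptr++ = src`: write at ptr, then advance ptr by one word."""
    mem[regs[ptr]] = regs[src]
    regs[ptr] += 1


if __name__ == "__main__":
    regs = {"r1": 0, "r2": 2, "r3": 0}
    mem = [10, 20, 30, 40]          # ASSUMPTION: word-indexed toy memory

    # r3 = (long)*r1++r2
    load_postmodify(regs, mem, "r3", "r1", regs["r2"])
    print(regs)                     # r3 holds the old *r1; r1 moved on by r2

    # *r2++ = (long)r1
    store_postincrement(regs, mem, "r2", "r1")
    print(regs, mem)
```

In both helpers the memory access uses the pointer value from before the
update, which is the point of the "post" in postmodification.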
The following table lists the various addressing modes supported by the
DSP3210:
------------------------------------------------------------------------------
Instruction Type
Addressing Mode CA Data CA Data CA Arithmetic/ DA M/A &
Move Group Move Group Logic Group Special Func
(CAU Reg) (I/O Reg)
------------------------------------------------------------------------------
Short Immediate Yes
24-bit Immediate Yes
Memory Indirect Yes
CAU Register Direct Yes Yes Yes
IO Register Direct Yes
DAU Register Direct Yes Yes
Register Indirect Yes Yes Yes
Register Indirect with Yes Yes Yes
Postmodification
------------------------------------------------------------------------------
Latency Issues
--------------
The most difficult aspect of programming the DSP3210 is being aware of latency
in the instruction pipeline. There are four cases in the DAU when the pipeline
affects latency. The cases are:
1. DA Memory Writes. When a DA instruction specifies a write to memory, the
   value written is not available to be read from that location until four
   instructions later (a three instruction latency). For example:
       *r3 = a0 = a0     ; instruction 1
       *r3 = a3 = a3     ; instruction 2
       .                 ; instruction 3
       .                 ; instruction 4
       a1 = *r3          ; instruction 5
   The value read in instruction 5 is the value written in instruction 1, not
   instruction 2. Instructions 2, 3 and 4 are latent instructions for
   instruction 1, and instructions 3, 4 and 5 are latent for instruction 2.
2. Accumulator as Multiplier Input. When an accumulator is used as an input to
   the multiplier, the value used is the one established at least three
   instructions before the multiply instruction (a two instruction latency).
   Note that this also applies to an accumulator used in the X field of an
   instruction of the form:
       [Z=] aN = [-]Y {+,-}X
   For example:
       a0 = a0 + *r1 * *r2   ; instruction 1
       a0 = a0 + a1          ; instruction 2
       .                     ; instruction 3
       a2 = a0 * a0          ; instruction 4
       a1 = a2 + a0          ; instruction 5
   The value of a0 used in instruction 4 is calculated in instruction 1. The
   value of a2 used in instruction 5 is calculated in instruction 4, since there
   is no latency effect on accumulators used as inputs to the adder.
3. Branching. When a CA Control Group instruction of the form if()goto, call,
   return or goto is executed, the instruction immediately following is also
   executed before the branch occurs. This is commonly referred to as a
   delayed branch. The ireturn instruction is different: execution of the
   base-level program resumes in the following instruction cycle. For example:
       if(eq) goto over  ; instruction 1
       r1 = 3            ; instruction 2
       .                 ; instruction 3
   Instruction 2 is executed even if the condition is true and the branch is
   taken. If this is undesirable, a nop can be placed after the branch
   instruction, or if possible, the instructions can be rearranged. Because of
   this latency, a complex situation arises if successive branch instructions
   are coded in the following manner:
       goto A            ; instruction 1
       goto B            ; instruction 2
       A:
       .
       B:
       .
       C:
       .
   The order of execution is instruction 1, instruction 2, A, B. If the
   instruction at A is not a goto, execution continues from B. If the
   instruction at A is goto C, the order of execution is instruction 1,
   instruction 2, A, B, C, and execution continues from C. Successive branch
   instructions are useful in some applications.
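The execution order described for successive branches can be reproduced with a
very small interpreter. The following Python sketch assumes the usual
two-counter model of a single delay slot (a current and a next program
counter); that model, the function names and the toy program layout are
illustrative assumptions, not a description of the DSP3210 sequencer.

```python
def trace_delayed_branches(program, labels):
    """Return the names of the instructions in the order they execute.

    `program` is a list of (name, target) pairs. `target` is a label name for
    a goto and None for any other instruction. A goto takes effect only after
    the instruction that follows it has executed (one delay slot).
    """
    trace = []
    pc, next_pc = 0, 1
    while pc < len(program):
        name, target = program[pc]
        trace.append(name)
        if target is None:
            pc, next_pc = next_pc, next_pc + 1
        else:
            # The already-fetched next instruction runs first, then the target.
            pc, next_pc = next_pc, labels[target]
    return trace


def build_program(instruction_at_a):
    """Toy layout of the example: two gotos, then labels A, B and C."""
    program = [
        ("instruction 1", "A"),   # goto A
        ("instruction 2", "B"),   # goto B
        instruction_at_a,         # A:
        ("A+1", None),
        ("B", None),              # B:
        ("B+1", None),
        ("C", None),              # C:
        ("C+1", None),
    ]
    labels = {"A": 2, "B": 4, "C": 6}
    return program, labels


if __name__ == "__main__":
    print(trace_delayed_branches(*build_program(("A", None))))
    print(trace_delayed_branches(*build_program(("A", "C"))))
```

With an ordinary instruction at A the trace begins instruction 1,
instruction 2, A, B and then carries on from B. With goto C at A it begins
instruction 1, instruction 2, A, B, C, matching the order given above.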
4. Conditional Branching on DAU Conditions. The condition tested by a DAU
   conditional branch or conditional arithmetic/logic instruction is the one
   established by the last DA instruction that affected the DAU flags at least
   four instructions before the test (a three instruction latency):
       a0 = a0 + a1      ; instruction 1
       a2 = a0 * a2      ; instruction 2
       .                 ; instruction 3
       .                 ; instruction 4
       if(agt) goto next ; instruction 5
       .                 ; instruction 6 (latent instruction)
   The condition tested in instruction 5 is established by instruction 1, not
   instruction 2. Because of this latency effect, use the zero-latency ifalt(),
   ifagt() and ifaeq() functions where possible (see the DAU special
   instructions listed above). The DA condition tested by these conditional
   accumulator loads is established by the last DA instruction that affected
   the DAU flags.
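The data latencies in cases 1, 2 and 4 all follow one rule: the consumer sees
the producer's result only if at least "latency" other instructions lie
between them. The following Python sketch turns that rule into a small
scheduling helper. The table keys and the function name are hypothetical, and
the helper only restates the latencies given above. Case 3 is left out because
a branch delay slot is not a data latency.

```python
# Instruction latencies as stated in the cases above.
LATENCY = {
    "da_memory_write": 3,    # case 1: DA write to memory, later read
    "multiplier_input": 2,   # case 2: accumulator used as multiplier input
    "dau_condition": 3,      # case 4: DAU flags tested by a branch
}


def padding_needed(kind, producer, consumer):
    """Filler instructions to add so `consumer` sees `producer`'s result.

    `producer` and `consumer` are instruction numbers in program order.
    The filler can be nops or any independent useful instructions.
    """
    if consumer <= producer:
        raise ValueError("consumer must come after producer")
    already_between = consumer - producer - 1
    return max(0, LATENCY[kind] - already_between)


if __name__ == "__main__":
    # Case 4 example: the test at instruction 5 sees instruction 1's flags...
    print(padding_needed("dau_condition", 1, 5))
    # ...but would need one more filler to see instruction 2's flags.
    print(padding_needed("dau_condition", 2, 5))
```

Applied to the examples above, no padding is needed between instructions 1 and
5 in cases 1 and 4 or between instructions 1 and 4 in case 2, while in each
case one extra filler instruction would be needed for the consumer to see the
result of instruction 2 instead.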
References
----------
"DSP3210 Digital Signal Processor. The Multimedia Solution", Information
Manual. AT&T, September 1991 printing.
"VCOS Multimedia Development Kit, Technical Reference". AT&T Release 1.0,
March 1992 printing.
DSP3210 Support Software Toolkit Manual, Release 1.3
DSP3210 Support Software Library Manual, Release 1.3